Semi-supervised Collaborative Text Classification

نویسندگان

  • Rong Jin
  • Ming Wu
  • Rahul Sukthankar
چکیده

Most text categorization methods require text content of documents that is often difficult to obtain. We consider “Collaborative Text Categorization”, where each document is represented by the feedback from a large number of users. Our study focuses on the semisupervised case in which one key challenge is that a significant number of users have not rated any labeled document. To address this problem, we examine several semi-supervised learning methods and our empirical study shows that collaborative text categorization is more effective than content-based text categorization and the manifold regularization is more effective than other state-of-the-art semi-supervised learning methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combining Unigrams and Bigrams in Semi-Supervised Text Classification

Unlabeled documents vastly outnumber labeled documents in text classification. For this reason, semi-supervised learning is well suited to the task. Representing text as a combination of unigrams and bigrams has not shown consistent improvements compared to using unigrams in supervised text classification. Therefore, a natural question is whether this finding extends to semi-supervised learning...

متن کامل

A Novel Multi label Text Classification Model using Semi supervised learning

Automatic text categorization (ATC) is a prominent research area within Information retrieval. Through this paper a classification model for ATC in multi-label domain is discussed. We are proposing a new multi label text classification model for assigning more relevant set of categories to every input text document. Our model is greatly influenced by graph based framework and Semi supervised le...

متن کامل

Towards Multi Label Text Classification through Label Propagation

Classifying text data has been an active area of research for a long time. Text document is multifaceted object and often inherently ambiguous by nature. Multi-label learning deals with such ambiguous object. Classification of such ambiguous text objects often makes task of classifier difficult while assigning relevant classes to input document. Traditional single label and multi class text cla...

متن کامل

Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding

This paper presents a new semi-supervised framework with convolutional neural networks (CNNs) for text categorization. Unlike the previous approaches that rely on word embeddings, our method learns embeddings of small text regions from unlabeled data for integration into a supervised CNN. The proposed scheme for embedding learning is based on the idea of two-view semi-supervised learning, which...

متن کامل

Improved Optimized Sentiment Classification On Dynamic Tweets

Real time Sentiment analysis is a subfield of Natural Language Processing concerned with the determination of opinion and subjectivity in a text, which has many applications. In this paper, classifiers for sentiment analysis of user opinion towards through comments and tweets sing Support Vector Machine (SVM) is described. The goal is to develop a classifier that performs sentiment analysis, by...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007